"VIDEO ENCODING METHOD AND CORRESPONDING ENCODING AND DECODING DEVICES" 

FIELD OF THE INVENTION 

The present invention relates to the field of video compression and, for instance, to 
the video coding standards of the MPEG family (MPEG-1, MPEG-2, MPEG-4) and to the video 
recommendations of the ITU-T H.26X family (H.261, H.263 and extensions, H.264). More specifically, 
this invention concerns an encoding method applied to an input video sequence corresponding to 
successive scenes subdivided into successive video object planes (VOPs) and generating, for coding 
all the video objects of said scenes, a coded bitstream constituted of encoded video data in which 
each data item is described by means of a bitstream syntax allowing to recognize and decode all 
the elements of the content of said bitstream, said content being described in terms of separate 
channels. 

The invention also relates to a corresponding encoding device, to a transmittable 
video signal consisting of a coded bitstream generated by such an encoding device, and to a 
method and a device for decoding a video signal consisting of such a coded bitstream. 

BACKGROUND OF THE INVENTION 

In the first video coding standards and recommendations (up to MPEG-4 and H.264), 
the video was assumed to be rectangular and to be described in terms of a luminance channel and 
two chrominance channels. With MPEG-4, an additional channel carrying shape information has 
been introduced. Two modes are available to compress those channels : the INTRA mode, where 
each channel is encoded by exploiting the spatial redundancy of the pixels in a given channel of a 
single image, and the INTER mode, exploiting the temporal redundancy between separate images. 
The INTER mode relies on a motion-compensation technique, which describes an image from one 
(or more) previously decoded image(s) by encoding the motion of pixels from one image to the 
other. Usually, the image to be encoded is partitioned into independent blocks, each of them being 
assigned a motion vector. A prediction of the image is then constructed by displacing pixel blocks 
from the reference image(s) according to the set of motion vectors (luminance and chrominance 
channels share the same motion description). Finally, the difference (called the residual signal) 
between the image to be encoded and its motion-compensated prediction is encoded in the INTER 
mode to further refine the decoded image. 
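
By way of illustration only, the INTER-mode prediction described above can be sketched as follows for a single channel. The function names, the list-of-rows image representation and the clamping at the image border are assumptions made for this sketch and are not taken from any standard.

```python
# Hypothetical sketch of block-based motion compensation on one channel.
# Images are lists of rows of integers; all names here are illustrative.

def predict_frame(reference, motion_vectors, block_size):
    """Build a prediction by copying displaced pixel blocks from the reference."""
    height, width = len(reference), len(reference[0])
    prediction = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Motion vector (dy, dx) of the block that contains pixel (y, x).
            dy, dx = motion_vectors[y // block_size][x // block_size]
            # Clamp to the image border (an assumed edge-handling rule).
            src_y = min(max(y + dy, 0), height - 1)
            src_x = min(max(x + dx, 0), width - 1)
            prediction[y][x] = reference[src_y][src_x]
    return prediction


def residual(current, prediction):
    """Residual signal: image to be encoded minus its prediction."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current, prediction)]


reference = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [11, 21, 31, 41],
    [51, 61, 71, 81],
]
# One vector per 2x2 block: each block is fetched one column to the right.
motion_vectors = [[(0, 1), (0, 1)], [(0, 1), (0, 1)]]
current = [
    [20, 30, 40, 40],
    [60, 70, 80, 80],
    [21, 31, 41, 42],
    [61, 71, 81, 81],
]
prediction = predict_frame(reference, motion_vectors, 2)
res = residual(current, prediction)
```

In this toy case the prediction matches the current image everywhere except one pixel, so the residual is almost entirely zero and only that small difference would remain to be encoded.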

The fact that all pixel channels are described by the same motion information is a 
limitation damaging the compression efficiency of the video coding system. In fact, several 
situations where the encoding of a motion vector set for all channels is not necessary can be 
identified. For instance, in sequences where there is little motion between successive frames, 
instead of encoding a full set of motion vectors repeating that each macroblock has no motion, it 
may be advantageous to signal that no motion is present. In other situations, instead of encoding a 
motion vector field, it may be advantageous to signal that the prediction of the motion vectors 
should be constructed by interpolating the image from several reference images (in this case, the 
decoder has to estimate a motion field between several reference images and interpolate it to 
create the prediction of the current image). Alternatively, a motion vector field can still be 
interpreted not directly from one or several reference image(s), but instead from the temporal 
interpolation of the reference images. 

There are also situations where the way of constructing the temporal prediction 
can be switched on a channel or macroblock basis. A first example is a sequence with a shape 
channel : it is possible that the shape information does not change much, whereas the luminance 
and chrominance channels carry varying information (it is for instance the case with a video 
depicting a rotating planet : the shape is always a disc, but the texture of it depends on the planet 
rotation). In this situation, the shape channel can be recovered by temporal copy, and the 
luminance and chrominance channels by motion compensated temporal interpolation. A second 
example is the case of a change at the macroblock level. In a video sequence showing a seascape 
with the sky in the upper part of the picture, unlike the sea, the sky remains the same from one 
image to the other. Its macroblocks can therefore be encoded by temporal copy, whereas the 
macroblocks of the sea have to be encoded by temporal interpolation. 



SUMMARY OF THE INVENTION 

It is therefore the object of the invention to propose a video encoding method in 
which the way the temporal prediction is formed can be adapted. 

To this end, the invention relates to a method such as defined in the introductory part 
of the description and which is moreover characterized in that said syntax comprises an additional 
syntactic element provided for describing independently the type of temporal prediction of the 
various channels, said predictions being chosen within a list comprising the following situations : 

- the temporal prediction is formed by directly applying the motion field sent by the 
encoder on one or more reference pictures ; 

- the temporal prediction is a copy of a reference image ; 

- the temporal prediction is formed by the temporal interpolation of the motion field ; 

- the temporal prediction is formed by the temporal interpolation of the current motion 
field and further refined by the motion field sent by the encoder. 

The invention also relates to a corresponding encoding device, to a transmittable 
video signal consisting of a coded bitstream generated by such an encoding device, and to a 
method and a device for decoding a video signal consisting of such a coded bitstream. 

DETAILED DESCRIPTION OF THE INVENTION 

According to the invention, it is proposed to introduce in the encoding syntax used by 
the video standards and recommendations a new syntactic element compensating for their lack of 
flexibility and opening new possibilities to encode more efficiently and independently the temporal 
prediction of the various channels. This additional syntactic element, called for instance "channel 
temporal prediction", takes the following symbolic values : 

Motion_compensation 

Temporal_copy 

Temporal_interpolation 

Motion_compensated_temporal_interpolation, 
and the meaning of these values is : 

a) motion_compensation : the temporal prediction is formed by directly applying the 
motion field sent by the encoder on one or more reference pictures (this default mode is implicitly 
the INTER coding mode of most of the current coding systems) ; 

b) temporal_copy : the temporal prediction is a copy of a reference image ; 

c) temporal_interpolation : the temporal prediction is formed by the temporal 
interpolation of the motion fields ; 

d) motion_compensated_temporal_interpolation : the temporal prediction is formed by 
the temporal interpolation of the current motion field and further refined by the motion field sent 
by the encoder. 
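
By way of illustration only, the four symbolic values a) to d) can be sketched as an enumeration together with a function returning the motion field a decoder might apply in each case. The numeric codes, the function names, the equal-weight averaging used for the interpolation, the additive form of the refinement and the modelling of the temporal copy as a zero motion field are all assumptions of this sketch, not part of the description.

```python
from enum import Enum


class ChannelTemporalPrediction(Enum):
    # Numeric codes are arbitrary; the description only gives symbolic values.
    MOTION_COMPENSATION = 0
    TEMPORAL_COPY = 1
    TEMPORAL_INTERPOLATION = 2
    MOTION_COMPENSATED_TEMPORAL_INTERPOLATION = 3


def interpolate(past_field, future_field):
    """Assumed interpolation: equal-weight average of two motion fields."""
    return [((py + fy) / 2, (px + fx) / 2)
            for (py, px), (fy, fx) in zip(past_field, future_field)]


def effective_motion_field(mode, sent_field, past_field, future_field):
    """Return the motion field used to form the temporal prediction."""
    if mode is ChannelTemporalPrediction.MOTION_COMPENSATION:
        # a) apply the motion field sent by the encoder as is.
        return list(sent_field)
    if mode is ChannelTemporalPrediction.TEMPORAL_COPY:
        # b) copying a reference image is modelled here as zero motion.
        return [(0, 0)] * len(past_field)
    interpolated = interpolate(past_field, future_field)
    if mode is ChannelTemporalPrediction.TEMPORAL_INTERPOLATION:
        # c) the interpolated field only; nothing sent is used.
        return interpolated
    if mode is ChannelTemporalPrediction.MOTION_COMPENSATED_TEMPORAL_INTERPOLATION:
        # d) interpolated field refined (here: added to) by the sent field.
        return [(iy + sy, ix + sx)
                for (iy, ix), (sy, sx) in zip(interpolated, sent_field)]
    raise ValueError(f"unknown mode: {mode!r}")


past = [(2, 0), (0, 4)]
future = [(4, 0), (0, 0)]
sent = [(1, 1), (0, -1)]
refined = effective_motion_field(
    ChannelTemporalPrediction.MOTION_COMPENSATED_TEMPORAL_INTERPOLATION,
    sent, past, future)
```

Under these assumptions, only modes a) and d) need the encoder to send a motion field; modes b) and c) can be reconstructed by the decoder from data it already holds.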

The words "temporal interpolation" must be understood in a broad sense, i.e. as 
meaning any operation of the type defined by an expression such as Vnew = a.Vpast + b.Vfuture 
+ K, where Vpast designates one (at least one) previous motion field, Vfuture designates one (at 
least one) future motion field, a and b designate coefficients respectively assigned to said past and 
future motion fields, K designates an offset and Vnew is the new motion field thus obtained. It can 
therefore be seen that, in fact, the particular case of the temporal copy is included in the more 
general case of the temporal interpolation, for b = 0 and K = 0 (or a = 0 and K = 0). 
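
By way of illustration only, the expression Vnew = a.Vpast + b.Vfuture + K can be sketched as follows, with a motion field represented as a list of (dy, dx) vectors. The function name, the representation of the offset K as one value per vector component, and the choice a = 1 in the degenerate case are assumptions of this sketch.

```python
# Hypothetical sketch of Vnew = a.Vpast + b.Vfuture + K applied vector by vector.

def interpolate_motion_field(v_past, v_future, a, b, k=(0.0, 0.0)):
    """Combine a past and a future motion field with weights a, b and offset k."""
    return [(a * py + b * fy + k[0], a * px + b * fx + k[1])
            for (py, px), (fy, fx) in zip(v_past, v_future)]


v_past = [(4, 0), (2, -2)]
v_future = [(0, 0), (6, 2)]

# Equal weights: the new field lies halfway between the past and future fields.
v_mid = interpolate_motion_field(v_past, v_future, 0.5, 0.5)

# Degenerate case b = 0 and K = 0 (with a = 1 assumed): the past field is reused.
v_copy = interpolate_motion_field(v_past, v_future, 1, 0)
```

With b = 0 and K = 0 the future field plays no part in the result, which is the sense in which the temporal copy is a particular case of the interpolation.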

The additional syntactic element thus proposed is expected to be placed at various 
levels in the coded bitstream that has to be stored, or transmitted to the decoding side : 

1) at the image level (or VOP level in MPEG-4 terminology), one syntactic element 
being placed in an INTER picture (its meaning is then shared by all the channels present in the 
VOP) ; 

2) at the image level (or VOP level), with a syntactic element for each present 
channel ; 

3) at the macroblock level, with moreover a specific syntactic element for each present 
channel. 
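
By way of illustration only, the three placement levels 1) to 3) can be sketched as a lookup that resolves the prediction type for a given channel and macroblock. The description does not state how the levels interact; the precedence used here (macroblock level over per-channel image level over shared image level), the dictionary layout and all names are assumptions of this sketch.

```python
# Hypothetical sketch: resolving the prediction type across signalling levels.

DEFAULT_MODE = "motion_compensation"  # described as the default INTER behaviour


def resolve_mode(channel, macroblock, vop_mode=None,
                 vop_channel_modes=None, mb_channel_modes=None):
    """Return the prediction type for one channel of one macroblock."""
    vop_channel_modes = vop_channel_modes or {}
    mb_channel_modes = mb_channel_modes or {}
    # Level 3: element specific to this macroblock and channel.
    if (macroblock, channel) in mb_channel_modes:
        return mb_channel_modes[(macroblock, channel)]
    # Level 2: one element per channel at the image (VOP) level.
    if channel in vop_channel_modes:
        return vop_channel_modes[channel]
    # Level 1: one element shared by all channels of the VOP.
    if vop_mode is not None:
        return vop_mode
    return DEFAULT_MODE


# Seascape-like example: macroblock 0 is sky, macroblock 5 is sea.
vop_channel_modes = {
    "luminance": "temporal_interpolation",
    "chrominance": "temporal_interpolation",
    "shape": "temporal_copy",
}
mb_channel_modes = {(0, "luminance"): "temporal_copy"}

sky = resolve_mode("luminance", 0, vop_channel_modes=vop_channel_modes,
                   mb_channel_modes=mb_channel_modes)
sea = resolve_mode("luminance", 5, vop_channel_modes=vop_channel_modes,
                   mb_channel_modes=mb_channel_modes)
```

Here the sky macroblock resolves to a temporal copy while the sea macroblock falls back to the per-channel interpolation mode signalled at the image level.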



PHFR030035 EPp 

CLAIMS : 

1. An encoding method applied to an input video sequence corresponding to successive 
scenes subdivided into successive video object planes (VOPs) and generating, for coding all the 
video objects of said scenes, a coded bitstream constituted of encoded video data in which each 
data item is described by means of a bitstream syntax allowing to recognize and decode all the 
elements of the content of said bitstream, said content being described in terms of separate 
channels, said method being further characterized in that said syntax comprises an additional 
syntactic element provided for describing independently the type of temporal prediction of the 
various channels, said predictions being chosen within a list comprising the following situations : 

- the temporal prediction is formed by directly applying the motion field sent by the 
encoder on one or more reference pictures ; 

- the temporal prediction is a copy of a reference image ; 

- the temporal prediction is formed by the temporal interpolation of the motion field ; 

- the temporal prediction is formed by the temporal interpolation of the current motion 
field and further refined by the motion field sent by the encoder. 

2. An encoding method according to claim 1, characterized in that said additional 
syntactic element is placed at the image level in said generated coded bitstream and its meaning is 
shared by all existing channels. 

3. An encoding device processing an input video sequence that corresponds to 
successive scenes subdivided into successive video object planes (VOPs) and generating, for coding 
all the video objects of said scenes, a coded bitstream constituted of encoded video data in which 
each data item is described by means of a bitstream syntax allowing to recognize and decode all 
the elements of the content of said bitstream, said content being described in terms of separate 
channels, said encoding device being provided for carrying out the encoding method according to 
any one of claims 1 and 2. 

4. A transmittable video signal consisting of a coded bitstream generated by an encoding 
device processing an input video sequence that corresponds to successive scenes subdivided into 
successive video object planes (VOPs) and generating, for coding all the video objects of said 
scenes, a coded bitstream constituted of encoded video data in which each data item is described 
by means of a bitstream syntax allowing to recognize and decode all the elements of the content of 
said bitstream, said content being described in terms of separate channels, said transmittable video 
signal including an additional syntactic element provided for describing independently the type of 
temporal prediction of the various channels, said predictions being chosen within a list comprising 
the following situations : 

- the temporal prediction is formed by directly applying the motion field sent by the 
encoder on one or more reference pictures ; 

- the temporal prediction is a copy of a reference image ; 

- the temporal prediction is formed by the temporal interpolation of the motion field ; 

- the temporal prediction is formed by the temporal interpolation of the current 
motion field and further refined by the motion field sent by the encoder. 

5. A method for decoding a transmittable video signal consisting of a coded bitstream 
generated by an encoding device processing an input video sequence that corresponds to 
successive scenes subdivided into successive video object planes (VOPs) and generating, for coding 
all the video objects of said scenes, a coded bitstream constituted of encoded video data in which 
each data item is described by means of a bitstream syntax allowing to recognize and decode all 
the elements of the content of said bitstream, said content being described in terms of separate 
channels, said transmittable video signal including an additional syntactic element provided for 
describing independently the type of temporal prediction of the various channels, said predictions 
being chosen within a list comprising the following situations : 

- the temporal prediction is formed by directly applying the motion field sent by the 
encoder on one or more reference pictures ; 

- the temporal prediction is a copy of a reference image ; 

- the temporal prediction is formed by the temporal interpolation of the motion field ; 

- the temporal prediction is formed by the temporal interpolation of the current 
motion field and further refined by the motion field sent by the encoder. 

6. A decoding device for carrying out a decoding method according to claim 5. 

Abstract 

The invention relates to an encoding method applied to an input video sequence 
corresponding to successive scenes subdivided into successive video object planes (VOPs) and 
generating, for coding all the video objects of said scenes, a coded bitstream constituted of 
encoded video data in which each data item is described by means of a bitstream syntax allowing 
to recognize and decode all the elements of the content of said bitstream. According to said 
method, said syntax comprises an additional syntactic element provided for describing 
independently the type of temporal prediction of the various channels, said additional syntactic 
element being placed at the image level in the coded bitstream and shared by all existing channels. 